Results 1 - 20 of 1,769
1.
Cell ; 187(7): 1801-1818.e20, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38471500

ABSTRACT

The repertoire of modifications to bile acids and related steroidal lipids by host and microbial metabolism remains incompletely characterized. To address this knowledge gap, we created a reusable resource of tandem mass spectrometry (MS/MS) spectra by filtering 1.2 billion publicly available MS/MS spectra for bile-acid-selective ion patterns. Thousands of modifications are distributed throughout animal and human bodies as well as microbial cultures. We employed this MS/MS library to identify polyamine bile amidates, prevalent in carnivores. They are present in humans, and their levels alter with a diet change from a Mediterranean to a typical American diet. This work highlights the existence of many more bile acid modifications than previously recognized and the value of leveraging public large-scale untargeted metabolomics data to discover metabolites. The availability of a modification-centric bile acid MS/MS library will inform future studies investigating bile acid roles in health and disease.


Subjects
Bile Acids and Salts , Gastrointestinal Microbiome , Metabolomics , Tandem Mass Spectrometry , Animals , Humans , Bile Acids and Salts/chemistry , Metabolomics/methods , Polyamines , Tandem Mass Spectrometry/methods , Databases, Chemical
2.
J Chem Inf Model ; 64(6): 1975-1983, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38483315

ABSTRACT

Most online chemical reaction databases are neither publicly accessible nor fully downloadable. These databases tend to contain reactions in noncanonicalized formats and often lack comprehensive information regarding reaction pathways, intermediates, and byproducts. Within the few publicly available databases, reactions are typically stored in the form of unbalanced, overall transformations with minimal interpretability of the underlying chemistry. These limitations present significant obstacles to data-driven applications, including the development of machine learning models. In an effort to overcome these challenges, we introduce PMechDB, a publicly accessible platform designed to curate, aggregate, and share polar chemical reaction data in the form of elementary reaction steps. Our initial version of PMechDB consists of over 100,000 such steps. In PMechDB, all reactions are stored as canonicalized and balanced elementary steps, featuring accurate atom mapping and arrow-pushing mechanisms. As an online interactive database, PMechDB provides multiple interfaces that enable users to search, download, and upload chemical reactions. We anticipate that the public availability of PMechDB and its standardized data representation will prove beneficial for chemoinformatics research and education and for the development of data-driven, interpretable models for predicting reactions and pathways. The PMechDB platform is accessible online at https://deeprxn.ics.uci.edu/pmechdb.


Subjects
Databases, Chemical , Databases, Factual
3.
J Chem Inf Model ; 64(8): 2948-2954, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38488634

ABSTRACT

SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. Because SMARTS is an extension of SMILES, many different SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm that is invariant to the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism by which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.


Subjects
Algorithms , Software , Programming Languages , Cheminformatics/methods , Databases, Chemical
4.
J Chem Inf Model ; 64(4): 1158-1171, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38316125

ABSTRACT

Over the last five years, virtual screening of ultralarge synthesis on-demand libraries has emerged as a powerful tool for hit identification in drug discovery programs. As these libraries have grown to tens of billions of molecules, we have reached a point where it is no longer cost-effective to screen every molecule virtually. To address these challenges, several groups have developed heuristic search methods to rapidly identify the best molecules on a virtual screen. This article describes the application of Thompson sampling (TS), an active learning approach that streamlines the virtual screening of large combinatorial libraries by performing a probabilistic search in the reagent space, thereby never requiring the full enumeration of the library. TS is a general technique that can be applied to various virtual screening modalities, including 2D and 3D similarity search, docking, and application of machine-learning models. In an illustrative example, we show that TS can identify more than half of the top 100 molecules from a docking-based virtual screen of 335 million molecules by evaluating 1% of the data set.
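
As a rough illustration of the reagent-space search described above, the following minimal sketch applies Thompson sampling to a toy two-component library. The function name, the Gaussian belief used for each reagent, and the way statistics are updated are assumptions made for illustration; they are not taken from the article, and `score` stands in for any expensive evaluation such as docking.

```python
import random


def thompson_sample_library(n_a, n_b, score, n_iter, seed=0):
    """Search an n_a x n_b combinatorial library without enumerating it.

    Each reagent keeps a running mean of the scores of the products it took
    part in; the sampling noise shrinks as a reagent is observed more often.
    Higher scores are treated as better.
    """
    rng = random.Random(seed)
    # Per-reagent statistics for each component: [sum of scores, observations]
    stats = [[[0.0, 0] for _ in range(n)] for n in (n_a, n_b)]
    evaluated = {}  # (reagent_a, reagent_b) -> score of that product

    def draw(stat):
        total, count = stat
        mean = total / count if count else 0.0
        # Assumed belief: a Gaussian whose spread narrows with more data
        return rng.gauss(mean, 1.0 / (count + 1) ** 0.5)

    for _ in range(n_iter):
        # For each component, pick the reagent with the best sampled value
        pick = tuple(
            max(range(len(component)), key=lambda i: draw(component[i]))
            for component in stats
        )
        # Only call the expensive scoring function for unseen products
        if pick not in evaluated:
            evaluated[pick] = score(*pick)
        value = evaluated[pick]
        # Update the statistics of every reagent used in this product
        for component, idx in zip(stats, pick):
            component[idx][0] += value
            component[idx][1] += 1
    return evaluated
```

Calling `thompson_sample_library(20, 20, my_score, 200)` with a hypothetical `my_score(a, b)` evaluates at most 200 of the 400 possible products; the returned dictionary can then be sorted by value to list the best products found.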


Subjects
Databases, Chemical , Drug Discovery , Drug Discovery/methods
6.
SLAS Discov ; 29(2): 100144, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38316342

ABSTRACT

The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100 K compounds. In total, one hundred teams took part in the challenge to predict low-, medium-, and high-solubility compounds as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) available on the website https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors, and strategy that contributed to the winning solution. In particular, we show that a consensus based on 28 models calculated using descriptor-based and representation learning methods allowed us to obtain the best score, which was higher than those based on individual approaches or on consensus models developed using each individual approach. A combination of diverse models allowed us to decrease both the bias and the variance of individual models and to calculate the highest score. The model based on Transformer CNN contributed the best individual score, thus highlighting the power of Natural Language Processing (NLP) methods. The inclusion of information about aleatoric uncertainty would be important for contestants to better understand and use the challenge data.
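
The consensus idea above can be sketched in a few lines. This is only an illustration under a stated assumption: the models are combined by an unweighted average of their class probabilities, which may differ from the weighting actually used in the winning solution. The function name and the input layout are hypothetical.

```python
def consensus_predict(model_probs):
    """Average per-class probabilities across models and pick the top class.

    model_probs: one entry per model, each a list of class probabilities
    (for example [low, medium, high] solubility) for a single compound.
    Returns the index of the winning class and the averaged probabilities.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    # Unweighted mean over models, computed class by class (an assumption)
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg
```

Averaging tends to cancel the uncorrelated errors of individual models, which is one common explanation for the variance reduction the abstract reports.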


Subjects
Algorithms , Neural Networks, Computer , Solubility , Consensus , Databases, Chemical
7.
J Biol Chem ; 300(2): 105624, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38176651

ABSTRACT

The glycosylation of proteins and lipids is known to be closely related to the mechanisms of various diseases such as influenza, cancer, and muscular dystrophy. Therefore, it has become clear that the analysis of post-translational modifications of proteins, including glycosylation, is important to accurately understand the functions of each protein molecule and the interactions among them. In order to conduct large-scale analyses more efficiently, it is essential to promote the accumulation, sharing, and reuse of experimental and analytical data in accordance with the FAIR (Findability, Accessibility, Interoperability, and Re-usability) data principles. However, a FAIR data repository for storing and sharing glycoconjugate information, including glycopeptides and glycoproteins, in a standardized format did not exist. Therefore, we have developed GlyComb (https://glycomb.glycosmos.org) as a new standardized data repository for glycoconjugate data. Currently, GlyComb can assign a unique identifier to a set of glycosylation information associated with a specific peptide sequence or UniProt ID. By standardizing glycoconjugate data via GlyComb identifiers and coordinating with existing web resources such as GlyTouCan and GlycoPOST, a comprehensive system for data submission and data sharing among researchers can be established. Here we introduce how GlyComb is able to integrate the variety of glycoconjugate data already registered in existing data repositories to obtain a better understanding of the available glycopeptides and glycoproteins, and their glycosylation patterns. We also explain how this system can serve as a foundation for a better understanding of glycan function.


Subjects
Databases, Chemical , Glycomics , Proteomics , Glycopeptides/metabolism , Glycoproteins/metabolism , Glycosylation , Polysaccharides/metabolism , Databases, Genetic
8.
BMC Complement Med Ther ; 24(1): 40, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38229051

ABSTRACT

BACKGROUND: As chromatographic techniques have advanced, many articles that analyze the constituent compounds of medicinal materials have been published in relation to Northeast Asian traditional medicine, including traditional Chinese medicine (TCM). TM-MC was launched in 2015, providing information about the chemical compounds in medicinal materials from chromatographic articles in PubMed. Since 2015, through continuous curation efforts, we have now released TM-MC 2.0 with significant improvements to the quantity and quality of the data ( https://tm-mc.kr ). DESCRIPTION: TM-MC 2.0 contains 635 medicinal materials, 34,107 chemical compounds (21,306 identified and de-duplicated), 13,992 targets, 27,997 diseases, and 5,075 prescriptions (2,393 de-duplicated by name). The database provides the largest number of identified compounds for medicinal materials listed in the pharmacopoeia compared to all TCM databases. In particular, marker compounds of medicinal materials and many newly discovered compounds were added through the manual curation of recent chromatographic articles. CONCLUSION: TM-MC 2.0 provides the largest collection of information about the chemical compounds of the medicinal materials listed in the Korean, Chinese, and Japanese pharmacopoeias. Our database can be utilized for network pharmacology in traditional medicine and for the compound screening of medicinal materials for modern drug discovery.


Subjects
Databases, Chemical , Medicine, Chinese Traditional , Medicine, Traditional , Databases, Factual
9.
Nucleic Acids Res ; 52(D1): D1355-D1364, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37930837

ABSTRACT

The metabolic roadmap of drugs (MRD) is a comprehensive atlas for understanding the stepwise and sequential metabolism of a given drug in living organisms. It plays a vital role in lead optimization, personalized medication, and ADMET research. The MRD consists of three main components: (i) the sequential catalyses of a drug and its metabolites by different drug-metabolizing enzymes (DMEs), (ii) a comprehensive collection of metabolic reactions along the entire MRD and (iii) a systematic description of efficacy & toxicity for all metabolites of a studied drug. However, there is no database available for describing the comprehensive metabolic roadmaps of drugs. Therefore, in this study, a major update of INTEDE was conducted, which provided the stepwise & sequential metabolic roadmaps for a total of 4701 drugs, and a total of 22 165 metabolic reactions containing 1088 DMEs and 18 882 drug metabolites. Additionally, INTEDE 2.0 labeled the pharmacological properties (pharmacological activity or toxicity) of metabolites and provided their structural information. Furthermore, 3717 drug metabolism relationships were supplemented (from 7338 to 11 055). All in all, INTEDE 2.0 is expected to attract broad interest from the related research community and to serve as an essential supplement to existing pharmaceutical/biological/chemical databases. INTEDE 2.0 can now be accessed freely without any login requirement at: http://idrblab.org/intede/.


Subjects
Databases, Chemical , Databases, Factual , Inactivation, Metabolic , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism
12.
IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3759-3771, 2023.
Article in English | MEDLINE | ID: mdl-37812549

ABSTRACT

Molecular fingerprints are important cheminformatics tools that map molecules into a vector space according to their characteristics, such as functional groups, atom sequences, and other topological structures. In this paper, we investigate a novel molecular fingerprint, Anonymous-FP, that captures rich information about the underlying interactions in small-, medium-, and large-scale atom chains. In detail, the possible atom chains from each molecule are sampled and extended as anonymous atom chains using an anonymous encoding scheme. After that, the molecular fingerprint Anonymous-FP is embedded into a vector space by means of the Natural Language Processing technique PV-DBOW. Anonymous-FP is studied on molecular property identification via molecule classification experiments on a series of molecule databases and has shown valuable advantages such as less dependence on prior knowledge, rich information content, full structural significance, and high experimental performance. During the experimental verification, the scale of the atom chain or its anonymous pattern is found to be significant to the overall representation ability of Anonymous-FP. Generally, the typical scale r = 8 could enhance the molecule classification performance, and specifically, Anonymous-FP achieves a classification accuracy above 93% on all NCI datasets.
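
The abstract does not spell out the anonymous encoding, so the following is only one plausible reading, borrowed from the general idea of anonymous walks: each atom label in a chain is replaced by the index of its first occurrence, so that chains with the same repetition pattern receive the same code. The function name and the use of element symbols as labels are assumptions, not the article's definition.

```python
def anonymize_chain(atoms):
    """Replace each label in a chain by the index of its first occurrence.

    atoms: sequence of hashable labels, here assumed to be element symbols.
    Chains with the same pattern of repeats map to the same tuple.
    """
    first_seen = {}
    encoded = []
    for atom in atoms:
        if atom not in first_seen:
            # A new label gets the next unused index
            first_seen[atom] = len(first_seen)
        encoded.append(first_seen[atom])
    return tuple(encoded)
```

Under this reading, the chains `["C", "O", "C", "N"]` and `["N", "C", "N", "O"]` share one anonymous pattern, which illustrates how such an encoding reduces dependence on prior chemical knowledge.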


Subjects
Cheminformatics , Databases, Chemical
13.
J Chromatogr A ; 1710: 464417, 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37778098

ABSTRACT

Liquid chromatography coupled with high-resolution mass spectrometry (LCHRMS) has proven challenging for annotating multiple small molecules within complex matrices due to the complexities of chemical structure and raw LCHRMS data, as well as limitations in the previous literature and reference spectra related to those molecules. In this study, we developed a molecular networking assisted automatic database screening (MN/auto-DBS) strategy to examine the combined effect of MS1 exact mass screening and MS2 similarity analysis. We compiled all previously reported compounds from the relevant literature. Using Python software developed for this purpose, the in-house database (DB) was created by automatically calculating the m/z values, and data from experimental MS1 hits were rapidly screened against the DB. We then performed a feature-based molecular network analysis on the auto-MS2 data for supplementary identification of unreported compounds, including clustered FBMN and annotated GNPS compounds. Finally, the results from both strategies were merged and manually curated for correct structural assignment. To demonstrate the applicability of MN/auto-DBS, we selected the Huangqi-Danshen herb pair (HD), commonly used in prescriptions or patent medicines to treat diabetic nephropathy and cerebrovascular disease. A total of 223 compounds were annotated, including 65 molecules not previously reported in HD, such as aromatic polyketides, coumarins, and diarylheptanoids. Using MN/auto-DBS, we can profile and mine a wide range of complex matrices for potentially new compounds.
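
The MS1 exact mass screening step can be illustrated with a small sketch. It assumes a protonated adduct ([M+H]+) and a 5 ppm tolerance, and it uses two hypothetical database entries chosen only as examples; the authors' software, adduct list, and tolerance are not described in the abstract.

```python
# Monoisotopic masses (u) of the elements used in this example
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON = 1.007276466812  # mass of a proton, added for the [M+H]+ adduct

# Hypothetical in-house database: name -> elemental composition
CANDIDATES = {
    "coumarin": {"C": 9, "H": 6, "O": 2},
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
}


def mz_m_plus_h(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion for a composition."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in formula.items())
    return neutral + PROTON


def screen(observed_mz, database, ppm=5.0):
    """Return the database entries whose [M+H]+ lies within the ppm window."""
    hits = []
    for name, formula in database.items():
        theoretical = mz_m_plus_h(formula)
        error_ppm = abs(observed_mz - theoretical) / theoretical * 1e6
        if error_ppm <= ppm:
            hits.append(name)
    return hits
```

For instance, `screen(147.0441, CANDIDATES)` matches the coumarin entry, while an m/z with no entry inside the tolerance window returns an empty list and would be left to the MS2 similarity analysis.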


Subjects
Software , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Chromatography, Liquid , Databases, Chemical , Databases, Factual , Chromatography, High Pressure Liquid/methods
14.
J Comput Aided Mol Des ; 37(12): 735-754, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37804393

ABSTRACT

QSAR models capable of predicting biological, toxicity, and pharmacokinetic properties are widely used to search for lead bioactive molecules in chemical databases. The preparation of the dataset used to build these models has a strong influence on the quality of the generated models, and sampling requires that the original dataset be divided into training (for model training) and test (for statistical evaluation) sets. This sampling can be done randomly or rationally, but the rational division is superior. In this paper, we present MASSA, a Python tool that can be used to automatically sample datasets by exploring the biological, physicochemical, and structural spaces of molecules using PCA, HCA, and K-modes. The proposed algorithm is very useful when the variables used for QSAR are not available or to construct multiple QSAR models with the same training and test sets, producing models with lower variability and better values for validation metrics. These results were obtained even when the descriptors used in the QSAR/QSPR were different from those used in the separation of training and test sets, indicating that this tool can be used to build models for more than one QSAR/QSPR technique. Finally, this tool also generates useful graphical representations that can provide insights into the data.
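
A rational split of the kind described above can be sketched as cluster-stratified sampling. The sketch assumes that cluster labels have already been computed (for example by the PCA, HCA, and K-modes steps named in the abstract) and simply draws the same fraction from every cluster; the function name and parameters are hypothetical and this is not MASSA's actual interface.

```python
import random
from collections import defaultdict


def cluster_stratified_split(labels, test_fraction=0.2, seed=0):
    """Split item indices so each cluster contributes to train and test.

    labels: one cluster label per molecule, in dataset order.
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)

    train, test = [], []
    for members in by_cluster.values():
        rng.shuffle(members)
        # Take roughly the same share of every cluster for the test set
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)
```

Because every cluster is represented in both sets, the test set covers the same regions of chemical space as the training set, which is the property a purely random split cannot guarantee.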


Subjects
Algorithms , Quantitative Structure-Activity Relationship , Databases, Chemical , Benchmarking
16.
Sci Rep ; 13(1): 13894, 2023 Aug 25.
Article in English | MEDLINE | ID: mdl-37626099

ABSTRACT

Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoma in adults. This study aimed to determine the prognostic significance of endoplasmic reticulum (ER) stress-related genes in DLBCL. ER stress-related genes were obtained from the Molecular Signatures Database. Gene expression data and clinical outcomes from the Gene Expression Omnibus and TCGA datasets were collected, and differentially expressed genes (DEGs) were screened out. Gene Ontology enrichment analysis, Kyoto Encyclopedia of Genes and Genomes pathway analysis, and gene set enrichment analysis were used to analyse the possible biological function of ER stress-related DEGs in DLBCL. A protein-protein interaction network was constructed using the STRING online database, and hub genes were identified with cytoHubba in the Cytoscape software. The significant prognosis-related genes were screened, and their differential expression was validated. The immune microenvironment associated with the significant genes was evaluated. Next, a nomogram was built using univariate and multivariate Cox regression analysis. Twenty-six ER stress-related DEGs were identified. Functional enrichment analysis showed them to be involved mainly in the regulation of the endoplasmic reticulum. NUPR1 and TRIB3 were identified as the most significant prognosis-related genes by comparison with the GSE10846, GSE11318, and TCGA datasets. NUPR1 was correlated with a good prognosis and immune infiltration in DLBCL; on the other hand, high expression of TRIB3 significantly correlated with a poor prognosis and was an independent prognostic factor for DLBCL. In summary, we identified NUPR1 and TRIB3 as critical ER stress-related genes in DLBCL. NUPR1 might be involved in immune infiltration in DLBCL, and TRIB3 might serve as a potential therapeutic target and prognostic factor in DLBCL.


Subjects
Lymphoma, Large B-Cell, Diffuse , Adult , Humans , Prognosis , Lymphoma, Large B-Cell, Diffuse/genetics , Nomograms , Databases, Chemical , Endoplasmic Reticulum Stress/genetics , Tumor Microenvironment/genetics
17.
Anal Chem ; 95(32): 11901-11907, 2023 Aug 15.
Article in English | MEDLINE | ID: mdl-37540774

ABSTRACT

The inability to identify the structures of most metabolites detected in environmental or biological samples limits the utility of nontargeted metabolomics. The most widely used analytical approaches combine mass spectrometry and machine learning methods to rank candidate structures contained in large chemical databases. Given the large chemical space typically searched, the use of additional orthogonal data may improve the identification rates and reliability. Here, we present results of combining experimental and computational mass and IR spectral data for high-throughput nontargeted chemical structure identification. Experimental MS/MS and gas-phase IR data for 148 test compounds were obtained from NIST. Candidate structures for each of the test compounds were obtained from PubChem (mean = 4444 candidate structures per test compound). Our workflow used CSI:FingerID to initially score and rank the candidate structures. The top 1000 ranked candidates were subsequently used for IR spectra prediction, scoring, and ranking using density functional theory (DFT-IR). Final ranking of the candidates was based on a composite score calculated as the average of the CSI:FingerID and DFT-IR rankings. This approach resulted in the correct identification of 88 of the 148 test compounds (59%). 129 of the 148 test compounds (87%) were ranked within the top 20 candidates. These identification rates are the highest yet reported when candidate structures are used from PubChem. Combining experimental and computational MS/MS and IR spectral data is a potentially powerful option for prioritizing candidates for final structure verification.
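
The final ranking step, a composite score computed as the average of two rankings, can be written out as a short sketch. The score dictionaries, the candidate names, and the tie-breaking rule are assumptions made for illustration; only the averaging of the two rank positions comes from the abstract.

```python
def ranks(scores):
    """Rank candidates from 1 (best, highest score); ties broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def composite_ranking(ms_scores, ir_scores):
    """Order candidates by the average of their MS/MS-based and IR-based ranks.

    ms_scores, ir_scores: candidate name -> score, higher meaning better.
    Both dictionaries are assumed to contain the same candidates.
    """
    ms_rank = ranks(ms_scores)
    ir_rank = ranks(ir_scores)
    composite = {name: (ms_rank[name] + ir_rank[name]) / 2 for name in ms_scores}
    # Lower composite rank is better; names break any remaining ties
    return sorted(composite, key=lambda name: (composite[name], name))
```

Averaging ranks rather than raw scores sidesteps the fact that the two scoring methods are on different scales, at the cost of discarding how large the score gaps between candidates are.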


Subjects
Databases, Chemical , Tandem Mass Spectrometry , Reproducibility of Results , Metabolomics/methods , Machine Learning
18.
Nucleic Acids Res ; 51(W1): W154-W159, 2023 Jul 05.
Article in English | MEDLINE | ID: mdl-37260078

ABSTRACT

DIANA-miRPath is an online miRNA analysis platform harnessing predicted or experimentally supported miRNA interactions towards the exploration of combined miRNA effects. In its latest version (v4.0, http://www.microrna.gr/miRPathv4), DIANA-miRPath breaks new ground by introducing the capacity to tailor its target-based miRNA functional analysis engine to specific biological and/or experimental contexts. Via a redesigned modular interface with rich interaction, annotation and parameterization options, users can now perform enrichment analysis on Gene Ontology (GO) terms, KEGG and REACTOME pathways, sets from the Molecular Signatures Database (MSigDB) and PFAM. Included miRNA interaction sets are derived from state-of-the-art resources of experimentally supported interactions (DIANA-TarBase v8.0, miRTarBase and microCLIP cell-type-specific interactions) or from in silico miRNA-target interactions (updated DIANA-microT-CDS and TargetScan predictions). Bulk and single-cell expression datasets from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx) and adult/fetal single-cell atlases are integrated and can be used to assess the expression of enriched term components across a wide range of states. A discrete module for enrichment analyses using CRISPR knock-out screen datasets enables the detection of selected miRNAs with potentially crucial roles within conditions under study. Notably, the option to upload custom interaction, term, expression and screen sets further expands the versatility of the miRPath webserver.


Subjects
MicroRNAs , Software , Cell Communication , Databases, Chemical , MicroRNAs/genetics , MicroRNAs/metabolism
19.
J Chem Inf Model ; 63(13): 3977-3982, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37358197

ABSTRACT

Here, we present MolBook UNIPI, freely available and user-friendly software specifically designed for medicinal chemists as a powerful tool for the easy management of virtual libraries of chemical compounds. With MolBook UNIPI, it is possible to create, store, handle, and share molecular databases in a very simple and intuitive way. The software allows users to rapidly generate libraries of bioactive ligands, building blocks, or commercial compounds by either manually creating single molecules or automatically importing compounds from public databases and pre-existing libraries. MolBook UNIPI databases can be enriched with all kinds of data and can be filtered based on molecular structures or properties, allowing the desired molecules, along with their structures and features, to be easily accessible in just a few clicks. Moreover, new molecular properties and potential toxicological effects of compounds can be rapidly and reliably predicted. Notably, all of these functions can be easily mastered even by inexperienced users, with no prior cheminformatics knowledge or programming skills, which makes MolBook UNIPI an invaluable tool for medicinal chemists. MolBook UNIPI can be downloaded free of charge from the project web page https://molbook.farm.unipi.it/.


Subjects
Databases, Chemical , Software , Databases, Factual , Ligands